CLMB: Deep Contrastive Learning for Robust Metagenomic Binning
Abstract
The reconstruction of microbial genomes from large metagenomic datasets is a critical procedure for finding uncultivated microbial populations and defining their functional roles. To achieve that, we need to perform metagenomic binning, clustering the assembled contigs into draft genomes. Despite the existing computational tools, most of them neglect one important property of the metagenomic data, that is, the noise. To further improve the metagenomic binning step and reconstruct better metagenomes, we propose a deep Contrastive Learning framework for Metagenome Binning (CLMB), which can efficiently eliminate the disturbance of noise and produce more stable and robust results. Essentially, instead of denoising the data explicitly, we add simulated noise to the training data and force the deep learning model to produce similar and stable representations for both the noise-free data and the distorted data. Consequently, the trained model will be robust to noise and handle it implicitly during usage. CLMB outperforms the previous state-of-the-art binning methods significantly, recovering the most near-complete genomes on almost all the benchmarking datasets (up to 17% more reconstructed genomes compared to the second-best method). It also improves the performance of bin refinement, reconstructing 8–22 more high-quality genomes and 15–32 more middle-quality genomes than the second-best result. Impressively, in addition to being compatible with the binning refiner, single CLMB even recovers on average 15 more HQ genomes than the refiner of VAMB and Maxbin on the benchmarking datasets. On a real mother-infant microbiome dataset with 110 samples, CLMB is scalable and practical to recover 365 high-quality and middle-quality genomes (including 21 new ones), providing insights into the microbiome transmission. CLMB is open-source and available at https://github.com/zpf0117b/CLMB/.
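The training idea in the abstract (distort the data with simulated noise, then push the model toward similar representations for the clean and distorted versions) can be illustrated with a generic contrastive objective. The sketch below is not taken from the CLMB code base. It assumes a Gaussian noise model and the NT-Xent loss, neither of which the abstract specifies, and the names `add_simulated_noise`, `nt_xent_loss`, and `contig_features` are hypothetical. It skips the encoder network and applies the loss directly to stand-in feature vectors.

```python
import math
import random


def add_simulated_noise(vector, rng, scale=0.05):
    """Return a distorted copy of a feature vector (assumed Gaussian noise model)."""
    return [x + rng.gauss(0.0, scale) for x in vector]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss (an assumption, not confirmed by the abstract).

    Row i of view_a and row i of view_b are two views of the same contig and
    form the positive pair; every other vector in the batch is a negative.
    """
    n = len(view_a)
    z = view_a + view_b  # 2n vectors in total
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same contig
        logits = {
            j: cosine(z[i], z[j]) / temperature
            for j in range(2 * n)
            if j != i  # a vector is never compared with itself
        }
        log_denominator = math.log(sum(math.exp(s) for s in logits.values()))
        total += log_denominator - logits[positive]
    return total / (2 * n)


rng = random.Random(0)
# Stand-in for per-contig feature vectors (hypothetical data).
contig_features = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]
view_1 = [add_simulated_noise(v, rng) for v in contig_features]
view_2 = [add_simulated_noise(v, rng) for v in contig_features]
unrelated = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]

loss_matched = nt_xent_loss(view_1, view_2)
loss_unrelated = nt_xent_loss(view_1, unrelated)
print(loss_matched < loss_unrelated)
```

In this sketch the loss is lower when the two views come from the same contigs than when the second view is unrelated, which is the signal a model would follow during training to make its representations insensitive to the injected noise.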
Similar resources
A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio
Motivation: With the rapid development of next-generation sequencing techniques, metagenomics, also known as environmental genomics, has emerged as an exciting research area that enables us to analyze the microbial environment in which we live. An important step for metagenomic data analysis is the identification and taxonomic characterization of DNA fragments (reads or contigs) resulting from s...
Metagenomic reads binning with spaced seeds
Article history: Received 23 February 2017 Received in revised form 16 May 2017 Accepted 21 May 2017 Available online xxxx
Metagenomic binning through low density hashing
Bacterial microbiomes of incredible complexity are found throughout the world, from exotic marine locations to the soil in our yards to within our very guts. With recent advances in Next-Generation Sequencing (NGS) technologies, we have vastly greater quantities of microbial genome data, but the nature of environmental samples is such that DNA from different species are mixed together. Here, we...
Learning Deep Energy Models: Contrastive Divergence vs. Amortized MLE
We propose a number of new algorithms for learning deep energy models from data motivated by a recent Stein variational gradient descent (SVGD) algorithm, including a Stein contrastive divergence (SteinCD) that integrates CD with SVGD based on their theoretical connections, and a SteinGAN that trains an auxiliary generator to generate the negative samples in maximum likelihood estimation (MLE)....
Learning Robust Deep Face Representation
With the development of convolution neural network, more and more researchers focus their attention on the advantage of CNN for face recognition task. In this paper, we propose a deep convolution network for learning a robust face representation. The deep convolution net is constructed by 4 convolution layers, 4 max pooling layers and 2 fully connected layers, which totally contains about 4M pa...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-04749-7_23